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Abstract — Molecular recognition, which is essential in 
processing information in biological systems, takes place in a 
crowded noisy biochemical environment and requires the 
recognition of a specific target within a background of various 
similar competing molecules. We consider molecular recognition 
as a transmission of information via a noisy channel and use this 
analogy to gain insights on the optimal, or fittest, molecular 
recognizer. We focus on the optimal structural properties of the 
molecules such as flexibility and conformation. We show that 
conformational changes upon binding, which often occur during 
molecular recognition, may optimize the detection performance of 
the recognizer. We thus suggest a generic design principle termed 
'conformational proofreading' in which deformation enhances 
detection. We evaluate the optimal flexibility of the molecular 
recognizer, which is analogous to the stochasticity in a decision 
unit. In some scenarios, a flexible recognizer, i.e., a stochastic 
decision unit, performs better than a rigid, deterministic one. As a 
biological example, we discuss conformational changes during 
homologous recombination, the process of genetic exchange 
between two DNA strands. 

Keywords — Molecular information channels, molecular 
recognition, conformational proofreading. 



I. Introduction 

Molecular recognition plays a key role in processing 
information in biological systems. Processes such as decoding 
DNA sequences by regulatory proteins, identification of 
antigens by antibodies and signal transduction by enzymes lean 
on the ability of molecules to recognize and bind a specific 
target. However, the crowded biological environment contains 
many molecules with similar structure that may compete with 
the "right" target. Moreover, recognition is often carried out 
using non-covalent binding energies that are of the same order 
as the stochastic thermal energy [1]. Thus, the recognition 
process is prone to false binding, which introduces errors and 
may impair the proper information flow. As a result, noise must 
be taken into account in the design and the evolution of 



molecular information channels and, specifically, molecular 
codes [2, 3]. 

The remarkable efficiency and specificity of molecular 
recognition led Emil Fisher already in 1890 to propose the 
'Lock and Key' model which postulates that an enzyme and a 
substrate should be complementary in shape and thus 
discriminate against other substrates that do not fit to enzyme 
binding site [Fig. 1(a)]. Later, studies indicated that the 
'native' conformations of many molecular recognizers and 
targets may not be exactly complementary ('Induced fit' [4]). 
Therefore, they may deform in order to bind each other and 
conformational changes occur upon binding [5-9]. 

Unlike the picture that arises from the 'Lock and Key' 
model, the interacting molecules are not always rigid objects 
that are complementary in structure. In the noisy biological 
environment, the molecular recognition process may involve 
molecules that fluctuate around non-complementary native 
conformations. As a result, a variety of complexes may be 
formed upon binding [Fig. 1(b)]. This leads to the question of 
whether conformational changes upon binding are merely a 
biochemical constrain or whether they are the outcome of an 
evolutionary pressure to optimize molecular recognition. 

In this work, we draw the analogies between molecular 
recognition and transmission of information via a noisy channel 
and discuss the optimal design of a molecular recognizer [10, 
11]. Our analysis show that the optimal design depends on the 
flexibility of the molecules. For typical biological values of 
flexibility, conformational changes upon binding may optimize 
the detection performance of the recognizer. The flexibility of 
the recognizer is analogous to stochasticity in the decision unit. 
Soft molecules with high flexibility fluctuate more than 
molecules with lower flexibility and thus are more 'noisy'. In 
some cases a flexible recognizer, a stochastic decision unit, 
performs better than a rigid, deterministic, one. As a biological 
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Fig. 1. Models of molecular recognition, (a) The 'Lock and Key' model. The bio-recognizer a has to discriminate between two competing targets A and B. The 
binding sites of the recognizer and its correct target A are complementary. Thus, their binding is tighter relative to the binding of the recognizer with an incorrect 
target B that has a non-complementary binding site, (b) Conformational changes upon binding. Both the recognizer a and its target A are fluctuating within an 
ensemble of conformations (only two are shown in the figure) and as a result a variety of complexes may be formed upon binding, (c) The typical recognition 
reaction can be described using Michaelis-Menten kinetics. In the first reversible step, the recognizer collides with a target and may bind it to form a complex. The 
stability of the formed complexes, aA or aB, depends on the dissociation constants, K A — \a\\A\l[aA~\, K B = [a][B]/[afi] where [] denotes concentration. In the second, 
irreversible step, the bound complexes initiate some function with a rate Va, in the case of aA, and with a rate v B in the case of aB. (d) Formulation of molecular 
recognition as a signal detection problem. The input signals are the target molecules A and B while the output is whether A or B are detected. In this analogy, the 
detection unit is the recognizer a and molecular binding governs the detection efficiency. Since all the molecules fluctuate within an ensemble of conformations, a 
noise should be added to the input signals and to the detection unit. 



case study, we discuss conformational changes during 
homologous recombination, an all-important process that 
facilitates the exchange of genetic material between 
homologous DNA molecules and their possible impact on 
correct detection [12]. 

II. Molecular recognition as a signal detection 

PROBLEM 

Let us consider a recognition process in which a recognizer 
a, for example an antibody, has to discriminate between two 
competing targets, A, a harmful pathogen, and B, a harmless 
self molecule [Fig. 1(c)]. The goal of the recognizer in this case 
is to trigger an immune response when it recognizes A, 
However, binding the harmless molecule B may lead to an 
undesirable auto-immune response. Molecular recognition 
processes typically follow Michaelis-Menten kinetics [13]. In 
the first reversible step, the recognizer collides with a target 
and may bind it to form a complex. The stability of the formed 
complexes, aA or aB in our case, depends on the dissociation 
constants, K A = [a][A]/[aA], K B = [a][fl]/[afi], where [] denotes 
concentration. The dissociation constant is related to the free 
energy difference between the unbound state (free molecules) 
and the bound state (complex) measured in units of k B T (AG > 
0), K ~ exp(-AG). A complex with low free energy has a large 
AG and a small K and thus is more stable. In the second, 



irreversible step, the bound complexes initiate a response, in 
our example an immune response, with a rate v A , in the case of 
aA, and with a rate v B in the case of aB. 

This recognition process can be treated as a signal detection 
problem [Fig. 1(d)]. The input signals are the target molecules 
A and B, while the output is whether to trigger or not to trigger 
an immune response (Table I). In this analogy, the detection 
unit is the recognizer a. Molecular binding, affected by the 
physical constrains on the molecules, governs the detection 
efficiency. Since all the molecules fluctuate within an ensemble 
of conformations, a noise should be added to the input signals 
and to the detection unit. The flexibility of the molecules 
affects their stochastic fluctuations and the noise associated 
with them. For example, a rigid recognizer does not fluctuate 
and thus can 'sample' only one conformation in a deterministic 
fashion. A flexible recognizer interconverts between an 
ensemble of conformations and therefore samples the 
conformations of its target in a stochastic fashion. We employ 
detection theory in order to evaluate the optimal 
conformational mismatch between a recognizer and its target 
and the optimal flexibility of these molecules. 
A standard measure for the quality of detection systems is the 
Bayesian decision rule [14]. This rule is obtained by 
minimizing the Bayesian cost function, Cj, using posteriori 
probabilities, 



Table I 

Example for a decision table of molecular recogntion. An antigen 

a has to discriminate between a pathogen^ and a harmless 
molecule B. The decision is made by binding between the antigen and 

THE TAREGTS. AS THE ANTIGEN BINDS ONE OF THE TARGETS, AN IMMUNE 

RESPONSE MAY BE TRIGGERED. THE TABLE SPECIFIES THE POSSIBLE DECISIONS 

AND THEIR PROBABILITIES. p h IS THE PROBABILITY THAT THE ANITGEN 

ENCOUNTERS A OR B WHILES IS THE CONDITIONAL DECISION PROBABILITY 

GIVEN THE INPUT. 



\^ Input 
Decision^^. 


Pathogen A 
Ph(4) 


Harmless Molecule B 
Pi,(B) 


Trigger 
response (t) 


True positive: a binds A 

PAAA) 


False positive: a binds B 
PAAB) 


Do not trigger 
Response (nt) 


False negative 
\-Pj[t\A) 


True negative 
l-Pj(t\B) 



c b=Yj C iiPhU)pA i \J), 



(1) 



where /?/,(/) is the probability for an input j to occur and Pd(i\j) is 
the conditional probability for an output i given the input was j 
(Zjpjj]/) - 1). Cy is the cost assigned for such a decision and 
measures the impact of each decision on the system. In the case 
of molecular recognition, the inputs are the encounters of the 
recognizer with A or B, denoted by sub-indices A and B, and 
the decisions are to trigger or not to trigger the response 
associated with A, denoted by sub-indices t and nt. 

Minimizing the Bayesian cost function (1) amounts to 
minimizing [11] 



C = -c 



■p h (A)-p d (t\A) 



■p h {B)-p d {t\By 



(2) 



where c corr ect ~ C„,_a ~ C t j and Ci ncorr ect ~ Qs — C„ ti g. Since the 
weights of the correct decisions, C, ?A and C„, iB , are negative and 
the weights of the false decisions, C, ?B and C„, A , are positive, 
both c correct and c incorrect are positive. Thus, increasing the 
conditional true positive probability Pd(t\A), reduces the cost 
while increasing the conditional false positive probability 
Pd{t\B), increases the cost. At the molecular level, p h is the 
probability that the recognizer encounters one of the targets. 
The conditional detection probability, p d , is the product of the 
binding probability between the molecules and the probability 
that the formed complex is functional, pf, p d -p b - pf. Therefore, 
the cost function (2) takes the form, 



C = ~c correcl ■ p h (A) ■ PM) ■ pMA) 



■ Ph (B)-p h (B)-p f (aB). 



(3) 



In this form, the cost function has a clear biological meaning. 
The first term of (3) is proportional to the rate of correct 



function induction, whereas the second term is proportional to 
the rate of incorrect function induction. 

The binding probability between the corresponding targets 
can be estimated by pt - 1/[1 + exp(-AG,)], where AG, is the 
total free energy difference between the unbound and bound 
states and is measured in units of k B T (AG, >0). As the 
recognizer and its target come into close contact they may 
deform in order to align their binding sites [Fig. 1(b)]. This 
deformation requires the investment of free energy, AGj e f. 
Once the binding sites are aligned, binding free energy, AG in „ 
is gained. The total free energy difference is thus, AG, - AG,„, 
- kG def . 

By introducing the binding probability, p h , into the 
detection cost C (3), we obtain a measure for the quality of 
detection as a function of the interaction and deformation free 
energies, 

C 



-1 



l + exp(AG def -AG intA ) 

+ t-- 



1 



(4) 



\ + exp(AG def -AG mtB ) 



where the sub-indices A and B denote the free energies of the 
correct and incorrect targets, respectively. The binding 
interactions with the correct target are stronger than the binding 
interactions with the incorrect one, AG,„,a > AG in , B . For clarity 
we have assumed that the binding sites of the competing targets 
are similar and thus the deformation free energy required to 
bind them is almost the same. The first term of (4) is the benefit 
from detecting the correct target A, increasing true positives 
and decreasing false negatives, whereas the second term is the 
penalty for detecting the incorrect target B, increasing false 
positives and decreasing true negatives. The parameter t — 
[Ci„co r rectPh(B)p^aB)]/[c correc ,p h (A)p^aA)] is the tolerance of the 
system. As t increases, the penalty for detecting an incorrect 
target increases and the system is less tolerant of errors. For 
example, this may occur when the penalty for an incorrect 
decision, c,„ correc ,, is much higher than the benefit from a correct 
one, c correc ,, or when incorrect targets are more abundant than 
correct ones. In a similar fashion, an error-tolerant system is 
characterized by a lower value of t. 

The optimal deformation free energy is the outcome of a 
tradeoff between maximizing the probability of binding to the 
correct target, while minimizing the probability of binding to 
the incorrect one. If the optimal deformation free energy value 
is zero, no deformation is needed and the native states of the 
molecules should be complementary ('locks and key'). 
However, as we show below, if the optimal deformation free 
energy is not zero, the optimal recognizer should not be 
complementary to its target and thus conformational changes 
occur upon binding. 

The binding probability as a function of the deformation 
free energy is a sharp sigmoid [Fig. 2(a)]. As the deformation 
free energy increases, the correct binding probability 
decreases. The correct binding probability reaches half its 
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Fig. 2. (a) The cost function and its dependence on the deformation free energy for the symmetric case in which the impact of correct and incorrect decision is equal, 
t = 1. AGi„i,A and AG,„,b are the free energy gains due to binding with the correct or incorrect target, respectively, such that AG,, a a > AG,„,,b. The correct binding 
probability is a sharp sigmoid that reaches half its maximal value at AG*/ ~ AGmj. The incorrect binding probability decreases more steeply than the correct one and 
reaches half its maximal value at AGdef ~ AG„„b. Thus, the cost function exhibits a minimum at a non-zero value, AG*/ = l/2( AGj„ t j + AG„„b). This implies that the 
recognizer and its target should not have complementary structures and thus conformational changes should occur upon binding, (b) The effect of the tolerance on 
the cost. In a system with low error-tolerance, t> 1 , the penalty for binding a incorrect target is higher than the gain from binding a correct target. Small deformation 
is less beneficial since it allows more incorrect binding and thus the optimal deformation free energy is slightly shifted to larger values relative to t = 1 . Similarly, the 
opposite phenomenon occurs at high error-tolerance, t < 1, and the deformation free energy is shifted to lower values. In the figure AG„,,« /AGj nt j = 1/2. 



maximal value at AG, 



def 



AG,-, 



At the same time, the 



probability of incorrect binding also decreases and reaches 



half its maximal value at AG 



def 



AG,„,n. The reduction of the 



incorrect binding probability is steeper than the correct 
binding probability. Therefore, for the symmetric case in 
which the impact of correct and incorrect decision is equal, t — 
1, the cost function exhibits a minimum at a non-zero value, 
AG def = l/2( AG int:A + AG,„ t , B ). 

The effect of tolerance on the cost C is shown in Fig. 2(b). 
The tendency of a system with low error-tolerance, t > 1, is to 
minimize the incorrect binding at the expense of correct 
binding. In such a system, the penalty for binding an incorrect 
target is higher than the gain from binding a correct target. A 
small deformation is less beneficial since it allows more 
incorrect binding and therefore the optimal deformation free 
energy is shifted to larger values relative to t — 1. Similarly, 
the opposite phenomenon occurs at high error-tolerance, t < 1, 
and the deformation free energy is shifted to lower values. 

III. Optimal conformational mismatch and 

FLEXIBILITY 

So far, we have discussed the optimal deformation free 
energy without accounting for the physical properties of the 
molecules such as conformation and flexibility. In order to 
evaluate the role of these parameters, their effect on the 
deformation energy must be specified. Modeling proteins as 
elastic networks was previously applied to study the 
fluctuations of proteins and to predict domain deformation 
upon binding [15-18]. 

Motivated by these studies, we treat the molecules as elastic 



networks and assume that the deformation energy is E — 
(l/2)kd 2 , where k is an effective spring constant and d is the 
structural mismatch between the native structures of the 
molecules. The mismatch, d, may be a length difference if the 
deformation is an extension or an angle difference if the 
deformation is a bending. Thus, the distribution of 
conformations is a Gaussian centered around the native 
conformation of the molecules with a variance a ~ Ilk 1 2 . Once 
the binding sites are aligned, binding interaction energy E int is 
gained. 

Using straightforward statistical mechanics calculation 
[1 1], the cost function (4) can now be expressed as 



C = 



-1 



i + s . e -^J_j*y2 )d > 



+t 



(5) 



1 , - E i„tB 1 (k„'2)d 2 

1 + 5 -e ' , — e B 



where kA/s is the harmonic mean of the recognizer and the 

target spring constant, k A /B =k a k A/B I (k a +k A/B ) . E intA and 

E inlB are the binding energy gains due to interactions between 
the recognizer and the correct or incorrect target, respectively, 
such that E inuA > E intB . The parameter s is a measure for the 
kinetic phase space of the system [11] and does not depend on 
the structural parameters of the molecules. We can now 



















Optimal recognizer - ■ - • 




Flexible 
recognizer 


Zero mismatch 
-~J^^"- C - phase 




K a 


Nonzero mismatch >. 

phase ^v. 




Rigid 
recognizer 






1 
Noisy k A /k B Noisy 






correct target incorrect target 





\\og(klk c ) 



opt 



(7) 



Fig. 3. Optimal design phases. There are two design phases for the optimal 
recognizer. In one, the native state structure of the recognizer should be 
complementary to the native state structure of its correct target, hi the other, 
there is a structural mismatch between the native states and thus 
conformational changes occur upon binding. The optimal mismatch depends 
on the flexibility of the recognizer and its targets. For targets with similar 
flexibility, k A Ik B = 1 , the design depends on whether the spring constant of the 
recognizer is above or below the critical one, k ac . For a noisy incorrect target, 
k A /k B > 1, the optimal design is a flexible recognizer with k a ~ k ac and a zero 
mismatch. For a noisy correct target, k A /k B < 1, the optimal design is a rigid 
recognizer with a non-zero mismatch. Typical biological flexibilities 
correspond to k„ > k clc and thus both designs may be beneficial. 

minimize C (5) and obtain the structural parameters that 
optimize detection [Fig. 3]. Our analysis shows that there are 
two optimal design "phases". In one phase, the native states of 
the optimal recognizer and the correct target are 
complementary, that is, the optimal mismatch is zero, d — 0. In 
the other phase, the native states are not complementary, d ^ 0, 
and, thus, conformational changes upon binding are beneficial. 
If the targets have similar flexibility, k A = k B — k t , as the 
flexibility of the recognizer, k a , is varied, the system 
undergoes a phase transition between the zero mismatch and 
the non-zero mismatch phases [11]. In the symmetric case, t 
=1, the critical spring constant at this transition is, 



*.-** 



K, k. 



(6) 



where k c - s exp(E intA + E intB ). For a very flexible recognizer, 
K < ka,e> tne deformation energy is much smaller than the 
binding energy gain of both targets. Thus, introducing a 
mismatch does not provide any benefit. This is also true if the 
targets are very flexible, k, <k c . However, above the critical 
flexibility, k a > k ac , the reconizer is more rigid and thus 
deformation can occur upon binding to the correct target, 
while it is not likely to occur upon binding to the incorrect 
target. Thus, the optimal mismatch has a nonzero value, 



As the recognizer becomes more rigid, the optimal mismatch 
decreases. Yet, the optimal mismatch for a rigid recognizer is 
still nonzero, d opt (k a — > oo) = {\og(k t lk c )lki) m . 

When the tolerance is asymmetric, t ^ 1, the system still 
undergoes a phase transition but the values of the critical 
parameters change. As the tolerance of the systems to errors is 
reduced (t > 1), avoiding a wrong decision is more beneficial 
than making the correct one. As a result, the critical spring 
constant is lower. For a system with high tolerance (t < 1), the 
priority is the formation of a correct product and thus the 
critical spring constant is higher. 

When the incorrect target is more flexible, that is fluctuates 
more, than the correct one, k A > k B , the ensemble of incorrect 
target conformations is more "spread" than the correct 
ensemble. The optimal design is a flexible recognizer, k a ~ k ac , 
with zero mismatch that can sample many correct 
conformations while sampling only few incorrect ones. In 
other words, a stochastic decision unit will perform better than 
a deterministic one. In the case where the correct target is 
noisier, k A < k B , the ensemble of correct target conformations 
is more spread than the incorrect ensemble. Thus, the optimal 
design is a rigid recognizer with a nonzero mismatch relative 
to the main target. 

IV. Discussion 

By applying the framework of signal detection to molecular 
recognition, we gain a quantitative insight on the optimal 
design of molecular recognizers in a noisy biochemical 
environment. The "phases" of optimal design depend on 
whether the recognizer flexibility is above or below a critical 
one [Fig. 3]. Typical structural parameters and interactions of 
biomolecules indicate that both design strategies, with or 
without a mismatch, are relevant for molecular recognition 
scenarios. If the competitor is noisier than the correct target, 
the optimal design is a recognizer with a critical flexibility and 
a zero mismatch. In this case, a noisy recognizer samples the 
target more efficiently than a deterministic, rigid, recognizer. 
If the correct target is noisier, the optimal design is a rigid 
recognizer with a non-zero mismatch which is similar to a 
rigid key that does not fit exactly into its lock. 

The result that conformational changes may provide 
optimal recognition, may explain their abundance in nature as 
a mechanism that increases the fitness of the recognition 
process. We thus suggest a general design principle, termed 
conformational proofreading, in which the introduction of a 
structural mismatch between the recognizer and its target 
enhances the quality of detection. Besides rationalizing 
observed natural processes, this kind of formalism may be 
used in the design of future synthetic biological recognition 
systems. 

More realistic and empirical examples can be incorporated 
into this framework in order to evaluate their optimal design. 
Such an example is the process of homologues recombination 



in which homologous DNA segments can be exchanged. 
Correct homologous recombination requires the detection of a 
specific homologous DNA sequence within a huge variety of 
heterologous sequences. During this process it is known that 
one of the DNA segments is deformed leading to a large 
mismatch relative to the target DNA. We suggest that this 
deformation can be explained using signal detection approach 
[12]. 
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